tutorials/021 - Global Configurations.ipynb (607 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)\n", "\n", "# 21 - Global Configurations\n", "\n", "[awswrangler](https://github.com/aws/aws-sdk-pandas) offers two ways to set global configurations that override the default arguments defined in function signatures.\n", "\n", "- **Environment variables**\n", "- **wr.config**\n", "\n", "*P.S. Check the [function API doc](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html) to see whether your function has arguments that can be configured through global configurations.*\n", "\n", "*P.P.S. One exception to the rules above is the `botocore_config` property: it cannot be set through environment variables,\n", "only via `wr.config`. It is used as the `botocore.config.Config` for all underlying `boto3` calls.\n", "The default config is `botocore.config.Config(retries={\"max_attempts\": 5}, connect_timeout=10, max_pool_connections=10)`.\n", "If you only want to change the retry behavior, you can use the environment variables `AWS_MAX_ATTEMPTS` and `AWS_RETRY_MODE`\n", "(see the [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables)).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment Variables" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: WR_DATABASE=default\n", "env: WR_CTAS_APPROACH=False\n", "env: WR_MAX_CACHE_SECONDS=900\n", "env: WR_MAX_CACHE_QUERY_INSPECTIONS=500\n", "env: WR_MAX_REMOTE_CACHE_ENTRIES=50\n", "env: WR_MAX_LOCAL_CACHE_ENTRIES=100\n" ] } ], "source": [ "%env WR_DATABASE=default\n", "%env WR_CTAS_APPROACH=False\n", "%env WR_MAX_CACHE_SECONDS=900\n", "%env WR_MAX_CACHE_QUERY_INSPECTIONS=500\n", "%env WR_MAX_REMOTE_CACHE_ENTRIES=50\n", "%env 
WR_MAX_LOCAL_CACHE_ENTRIES=100" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import botocore\n", "\n", "import awswrangler as wr" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>foo</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " foo\n", "0 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.athena.read_sql_query(\"SELECT 1 AS FOO\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resetting" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Specific\n", "wr.config.reset(\"database\")\n", "# All\n", "wr.config.reset()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## wr.config" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "wr.config.database = \"default\"\n", "wr.config.ctas_approach = False\n", "wr.config.max_cache_seconds = 900\n", "wr.config.max_cache_query_inspections = 500\n", "wr.config.max_remote_cache_entries = 50\n", "wr.config.max_local_cache_entries = 100\n", "# Set botocore.config.Config that will be used for all boto3 calls\n", "wr.config.botocore_config = botocore.config.Config(\n", " retries={\"max_attempts\": 10}, connect_timeout=20, max_pool_connections=20\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", 
"<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>foo</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " foo\n", "0 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.athena.read_sql_query(\"SELECT 1 AS FOO\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>name</th>\n", " <th>Env. 
Variable</th>\n", " <th>type</th>\n", " <th>nullable</th>\n", " <th>enforced</th>\n", " <th>configured</th>\n", " <th>value</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>catalog_id</td>\n", " <td>WR_CATALOG_ID</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>concurrent_partitioning</td>\n", " <td>WR_CONCURRENT_PARTITIONING</td>\n", " <td>&lt;class 'bool'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>ctas_approach</td>\n", " <td>WR_CTAS_APPROACH</td>\n", " <td>&lt;class 'bool'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>database</td>\n", " <td>WR_DATABASE</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>default</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>max_cache_query_inspections</td>\n", " <td>WR_MAX_CACHE_QUERY_INSPECTIONS</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>500</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>max_cache_seconds</td>\n", " <td>WR_MAX_CACHE_SECONDS</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>900</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>max_remote_cache_entries</td>\n", " <td>WR_MAX_REMOTE_CACHE_ENTRIES</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>50</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>max_local_cache_entries</td>\n", " <td>WR_MAX_LOCAL_CACHE_ENTRIES</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>100</td>\n", " 
</tr>\n", " <tr>\n", " <th>8</th>\n", " <td>s3_block_size</td>\n", " <td>WR_S3_BLOCK_SIZE</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>workgroup</td>\n", " <td>WR_WORKGROUP</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>chunksize</td>\n", " <td>WR_CHUNKSIZE</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>s3_endpoint_url</td>\n", " <td>WR_S3_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>athena_endpoint_url</td>\n", " <td>WR_ATHENA_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>sts_endpoint_url</td>\n", " <td>WR_STS_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>glue_endpoint_url</td>\n", " <td>WR_GLUE_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>redshift_endpoint_url</td>\n", " <td>WR_REDSHIFT_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>kms_endpoint_url</td>\n", " <td>WR_KMS_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " 
</tr>\n", " <tr>\n", " <th>17</th>\n", " <td>emr_endpoint_url</td>\n", " <td>WR_EMR_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>dynamodb_endpoint_url</td>\n", " <td>WR_DYNAMODB_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>secretsmanager_endpoint_url</td>\n", " <td>WR_SECRETSMANAGER_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>timestream_endpoint_url</td>\n", " <td>WR_TIMESTREAM_ENDPOINT_URL</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>botocore_config</td>\n", " <td>WR_BOTOCORE_CONFIG</td>\n", " <td>&lt;class 'botocore.config.Config'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>&lt;botocore.config.Config object at 0x14f313e50&gt;</td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>verify</td>\n", " <td>WR_VERIFY</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>True</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>address</td>\n", " <td>WR_ADDRESS</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>redis_password</td>\n", " <td>WR_REDIS_PASSWORD</td>\n", " <td>&lt;class 'str'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>ignore_reinit_error</td>\n", " <td>WR_IGNORE_REINIT_ERROR</td>\n", " <td>&lt;class 
'bool'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>include_dashboard</td>\n", " <td>WR_INCLUDE_DASHBOARD</td>\n", " <td>&lt;class 'bool'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>log_to_driver</td>\n", " <td>WR_LOG_TO_DRIVER</td>\n", " <td>&lt;class 'bool'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>object_store_memory</td>\n", " <td>WR_OBJECT_STORE_MEMORY</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>30</th>\n", " <td>cpu_count</td>\n", " <td>WR_CPU_COUNT</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>31</th>\n", " <td>gpu_count</td>\n", " <td>WR_GPU_COUNT</td>\n", " <td>&lt;class 'int'&gt;</td>\n", " <td>True</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>None</td>\n", " </tr>\n", " </tbody>\n", "</table>" ], "text/plain": [ "<awswrangler._config._Config at 0x1376ece80>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.config" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "bd595004b250e5f4145a0d632609b0d8f97d1ccd278d58fafd6840c0467021f9" } } }, "nbformat": 4, 
"nbformat_minor": 4 }